Method and system for efficient cache locking mechanism

ABSTRACT

Systems and methods for the implementation of more efficient cache locking mechanisms are disclosed. These systems and methods may alleviate the need to present both a virtual address (VA) and a physical address (PA) to a cache mechanism. A translation table is utilized to store both the address and the locking information associated with a virtual address, and this locking information is passed to the cache along with the address of the data. The cache can then lock data based on this information. Additionally, this locking information may be used to override the replacement mechanism used with the cache, thus keeping locked data in the cache. The translation table may also store translation table lock information such that entries in the translation table are locked as well.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a divisional of, and claims a benefit of priority under 35 U.S.C. 120 of the filing date of, U.S. patent application Ser. No. 11/145,844 by inventors Takeki Osanai and Kimberly Fernsler, entitled “Method and System for Efficient Cache Locking Mechanism,” filed on Jun. 6, 2005, the entire contents of which are hereby expressly incorporated by reference for all purposes.

TECHNICAL FIELD OF THE INVENTION

The invention relates in general to methods and systems for cache management, and more particularly, to efficient implementations of cache locking.

BACKGROUND OF THE INVENTION

In recent years, there has been an insatiable desire for faster computer processing data throughputs because cutting-edge computer applications are becoming more and more complex, placing ever increasing demands on microprocessing systems. The microprocessors in these systems may have very rapid cycle times and be capable of manipulating a great amount of data very quickly. The time to access the DRAM memories to which these microprocessors are coupled, however, may be considerably longer than the cycle time of the microprocessor and can vary dramatically based on the extant conditions at the time of the memory access.

In order to ameliorate the bottleneck imposed by the relatively long and variable access time to memory, memory hierarchies utilizing cache memories have been implemented in conjunction with microprocessors. Cache memory augments the data storage function of main memory by providing data storage that is significantly faster than DRAM memory and which provides consistent access times.

Due to the relatively high cost of cache memories, however, they are typically much smaller than main memory. Consequently, conventional replacement algorithms have been employed to determine what data should be stored in the cache memory. Most of these algorithms fill and replace elements within the cache according to some fixed policy, such that data is rotated in and out of the cache based on this policy.

Occasionally, however, programmers who design applications for these microprocessor systems wish certain critical memory contents to remain in the cache in order to guarantee fixed cycles of latency to access these critical memory contents. Cache locking allows some or all of the contents of the cache to be locked in place, unsusceptible to the cache replacement policy implemented on the system. This ability to lock the cache is available on several microprocessors, such as the PowerPC, some Intel x86 processors, the Motorola MPC7400, etc., and may allow static locking of the cache (the cache is loaded and locked at system start) and dynamic locking (the state of the cache may change during execution). While cache locking may decrease the performance of the cache, it allows programmers to more accurately predict a worst-case access time for a piece of data, which is particularly important in designing mission-critical systems.

Typically, however, the systems and methods utilized to lock the cache may require a large overhead. For example, in one implementation, to lock data elements within the cache a programmer may set the effective address of the data to be locked into a first register for managing a locked cache, and the set information for the set (way) of the cache to be locked into a second register for managing a locked cache. The first access to the effective address (or the virtual address) located in the first register may generate a cache reload to the set (way) of the cache pointed to by the second register. Subsequently, however, the hardware will not replace the cache entry referenced by the second register with other data whose address is different from the contents of the first register. Thus, the critical data remains in the cache.

This technique may require that both the effective address and the real address (or the physical address) of data be maintained by the load and store queues of the cache, which in turn imposes a heavy hardware penalty. In one implementation, the pair of registers used to manage lock addresses is established in an L2 cache unit. For example, if an effective address is fifty-two bits long, the load queue contains four entries and the store queue is eight entries long, implementing this type of cache locking mechanism requires somewhere on the order of 624 bits (52 bits for each of the 12 queue entries). The extra flip-flops required to store these bits may occupy a relatively large area on a modern microprocessor.

Thus, a need exists for efficient systems and methods for cache locking mechanisms which reduce the overhead associated with implementing this cache locking.

SUMMARY OF THE INVENTION

Systems and methods for the implementation of more efficient cache locking mechanisms are disclosed. These systems and methods may alleviate the need to present both a virtual address (VA) and a physical address (PA) to a cache mechanism. By eliminating the need for both of these addresses, the hardware requirements needed to implement a cache locking mechanism are eased by reducing the amount of hardware utilized to store address data in various steps of the cache pipeline. A translation table is utilized to store both the address and the locking information associated with a virtual address, and this locking information is passed to the cache along with the address of the data. The cache can then lock data based on this information. Additionally, this locking information may be used to override the replacement mechanism used with the cache, thus keeping locked data in the cache. The translation table may also store translation table lock information such that entries in the translation table are locked as well.

In one embodiment, a translation table is operable to store cache lock information corresponding with one or more of the entries in the translation table.

In another embodiment, this cache lock information may be used to override the replacement policy of a cache containing data associated with one of the entries.

In still another embodiment, the entries of the translation table may themselves be locked within the translation table.

Embodiments of the present invention may provide the technical advantage of reducing the amount of hardware, flip-flops and/or other logic needed to implement cache locking, and may make implementations of cache locking substantially faster.

These, and other, aspects of the invention will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. The following description, while indicating various embodiments of the invention and numerous specific details thereof, is given by way of illustration and not of limitation. Many substitutions, modifications, additions or rearrangements may be made within the scope of the invention, and the invention includes all such substitutions, modifications, additions or rearrangements.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings accompanying and forming part of this specification are included to depict certain aspects of the invention. A clearer impression of the invention, and of the components and operation of systems provided with the invention, will become more readily apparent by referring to the exemplary, and therefore nonlimiting, embodiments illustrated in the drawings, wherein identical reference numerals designate the same components. Note that the features illustrated in the drawings are not necessarily drawn to scale.

FIG. 1 depicts a block diagram of one embodiment of a system for locking entries in a cache.

FIG. 2 depicts a block diagram of one embodiment of a translation table.

FIG. 3 depicts a block diagram of one embodiment of a cache mechanism.

FIG. 4 depicts a block diagram of one embodiment of a system for locking entries in a cache.

FIG. 5 depicts a block diagram of one embodiment of a translation table.

FIG. 6 depicts a block diagram of one embodiment of a translation table.

FIG. 7 depicts a block diagram of one embodiment of a cache mechanism.

DESCRIPTION OF PREFERRED EMBODIMENTS

The invention and the various features and advantageous details thereof are explained more fully with reference to the nonlimiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well known starting materials, processing techniques, components and equipment are omitted so as not to unnecessarily obscure the invention in detail. Skilled artisans should understand, however, that the detailed description and the specific examples, while disclosing preferred embodiments of the invention, are given by way of illustration only and not by way of limitation. Various substitutions, modifications, additions or rearrangements within the scope of the underlying inventive concept(s) will become apparent to those skilled in the art after reading this disclosure.

Initially, a few terms are defined or clarified to aid in an understanding of the terms as used throughout the specification. The term “translation table” is intended to mean any software, hardware or combination thereof which supports the ability to translate a virtual address into a physical address, such as a translation lookaside buffer (TLB), effective to real address translation table (ERAT), etc. By the same token, the terms “virtual address” and “physical address” will be understood generically to refer to any type of virtual address and physical address, no matter the specific terms used with reference to a particular architecture, for example “effective address,” etc. Conversely, these specific terms will be understood to be specific examples of the generic terms. For example, an effective address will be a specific example of a virtual address. Additionally, these terms will be used generically no matter what size block the addresses are used to refer to, whether they be individual memory locations, word size, double word size, page size, etc.

Reference is now made in detail to the exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts (elements).

Before discussing embodiments of the present invention, an exemplary architecture for use in illustrating embodiments of the present invention is described. It will be apparent to those of ordinary skill in the art that this is a simple architecture intended for illustrative purposes only, and that the systems and methods described herein may be employed with any variety of more complicated or simpler architectures in a wide variety of microprocessing systems. For example, though the systems and methods of the present invention may be described with respect to a level 2 (L2) data cache, the same systems and methods are equally applicable to a level 1 (L1) cache, etc.

It will also be apparent that though the terminology used may be specific to a particular microprocessor architecture, the functionality referred to with this terminology may be substantially similar to the functionality in other microprocessor architectures. For example, an effective to real address translation table (ERAT) may be substantially equivalent to a translation lookaside buffer (TLB) or other virtual address to physical address translation or virtual page to real page translation mechanism.

FIG. 1 illustrates just such an exemplary architecture.

During an execute stage of a processor, arithmetic and operand address generation (agen) logic 110 may generate an effective address (EA) corresponding to a virtual location in memory where data related to an instruction is located, as is known in the art. This EA may be sent to L1 cache 120, tag generator 130, and translation table 140. L1 cache 120 may be a virtually indexed, physically tagged cache as is known in the art, such that the EA may be compared against entries in L1 cache 120 to determine if an entry in L1 cache 120 corresponds to the EA. Translation table 140 is capable of translating effective addresses into real addresses, as is known in the art, and may be implemented in a wide variety of ways, such as translation lookaside buffers (TLBs), ERATs, etc.

Substantially simultaneously to the EA being sent to L1 cache 120, the EA may be placed into a cache line miss buffer of cache miss transaction pipeline 152; translation table 140 translates the EA into a corresponding real address (RA); and tag bits corresponding to the EA are generated by tag generator 130 if they are available. Turning briefly to FIG. 2, one embodiment of a typical architecture for a translation table is depicted. Translation table 140 comprises storage area 210, which may be a content addressable memory capable of storing physical addresses 214 and corresponding attributes 212. When translation table 140 receives an EA on input lines 250, this EA may be used as a search key to storage area 210. If a match is found to the EA, RA 214 and attribute bits 212 corresponding to that EA are output on output lines 260.

Returning to FIG. 1, the RA 214 generated by translation table 140 is then stored in cache miss transaction pipeline 152 along with its corresponding EA. Entries in cache miss transaction pipeline 152 can then be placed in load/miss queue 170 or store queue 180 depending on the type of instruction which originally referenced the EA. From load/miss queue 170 or store queue 180, an EA and a corresponding RA may be sent to L2 cache mechanism 190.

FIG. 3 depicts one embodiment of an architecture for L2 cache mechanism 190. L2 cache mechanism 190 comprises L2 cache 300, lock information storage 310 and L2 cache refill mechanism 320. L2 cache 300 may store data; lock information storage 310 may store lock information pertaining to data within L2 cache 300; and L2 cache refill mechanism 320 may implement a replacement policy for L2 cache 300. Thus, cache mechanism 190 may implement an L2 cache capable of locking data in the cache. In one embodiment, lock information storage 310 may function by utilizing an EA sent from load/miss queue 170 or store queue 180, while L2 cache 300 may function based on the associated RA. When data is stored in L2 cache 300, if this data is to be locked, associated lock information is stored in lock information storage 310. When L2 cache refill mechanism 320 wishes to replace an entry in L2 cache 300, it checks lock information storage 310 for information pertaining to the entry it wishes to replace. If this entry is locked, L2 cache refill mechanism 320 selects a different entry of L2 cache 300 to replace.

As can be seen from FIGS. 1-3, certain implementations of a cache locking mechanism may necessitate that both the EA and RA of data be stored in entries of cache miss transaction pipeline 152, load/miss queue 170 and store queue 180 such that both the EA and the RA of data can be made available to L2 cache mechanism 190. Storing both the EA and RA, in turn, imposes heavy hardware requirements on a microprocessor.
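As a rough illustration of this cost, the following C sketch models a baseline queue entry that must carry both addresses. The field names and the RA width are assumptions chosen only to mirror the 52-bit example given above; this is a minimal sketch, not the patented implementation.

```c
#include <stdint.h>

/* Hypothetical queue entry for the baseline design of FIGS. 1-3: every
 * load/miss- or store-queue slot must hold both addresses, because the
 * lock bookkeeping is keyed by the EA while the L2 data array is keyed
 * by the RA.  Widths are modeled with comments rather than bit fields;
 * the 52-bit EA matches the example above, the RA width is an assumption. */
struct baseline_queue_entry {
    uint64_t ea;   /* effective (virtual) address, 52 significant bits */
    uint64_t ra;   /* real (physical) address, width is an assumption  */
    /* ... other per-entry state (valid bit, access size, etc.) ...    */
};

/* With 4 load-queue entries and 8 store-queue entries, the EA copies
 * alone account for 12 x 52 = 624 flip-flops, the figure cited above. */
```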

Attention is now directed to systems and methods for the implementation of more efficient cache locking mechanisms. These systems and methods may alleviate the need to present both a virtual address (VA) (or an effective address (EA)) and a physical address (PA) (or a real address (RA)) to a cache mechanism. By eliminating the need for both of these addresses, the hardware requirements needed to implement a cache locking mechanism are eased by reducing the hardware utilized to store address data in various steps of the cache pipeline. A translation table is utilized to store both the address and the locking information associated with a virtual address, and this locking information is passed to the cache along with the address of the data. The cache can then lock data based on this information. Additionally, this locking information may be used to override the replacement mechanism used with the cache, thus keeping locked data in the cache. The translation table may also store translation table lock information such that entries in the translation table are locked as well.

FIG. 4 illustrates one embodiment of just such an architecture. During an execute stage of a processor, arithmetic and operand address generation (agen) logic 410 may generate an effective address (EA) corresponding to a virtual location in memory where data related to an instruction is located, as is known in the art. This EA may be sent to L1 cache 420, tag generator 430, and translation table 440. L1 cache 420 may be a virtually indexed cache as is known in the art, such that the EA may be looked up in the L1 cache to determine if an entry in L1 cache 420 corresponds to the EA.

Substantially simultaneously to the EA being sent to L1 cache 420, the EA may be placed into a cache line miss buffer of cache miss transaction pipeline 452; translation table 440 translates the EA into a corresponding physical address (PA) and lock data; and tag bits corresponding to the EA are generated by tag generator 430 if they are available.

Turning briefly to FIG. 5, one embodiment of an architecture for a translation table is depicted. Translation table 440 may be a content addressable memory comprising a set of entries 550. Each entry 550 may contain a real address 552 and corresponding attributes 554. Attributes 554, in turn, may contain cache lock information 556, comprising bits pertaining to the locking of data corresponding to real address 552 within a cache. When translation table 440 receives an EA on its input lines, this EA may be used as a search key to storage area 510. If a match is found to the EA, the RA 552 and attribute bits 554, including cache lock information 556, corresponding to the EA are output on ERAT output lines 560.
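A minimal C sketch of such an entry and its lookup follows. The table size, field widths, the three-bit lock field, and the linear search standing in for the content addressable memory are all assumptions made for illustration, not details taken from the figure.

```c
#include <stdbool.h>
#include <stdint.h>

#define TT_ENTRIES 64           /* assumed table size */

struct tt_entry {
    bool     valid;
    uint64_t ea_tag;            /* tag portion of the effective address      */
    uint64_t ra;                /* real address (552)                        */
    uint8_t  attributes;        /* other attribute bits (554)                */
    unsigned cache_lock : 3;    /* cache lock information (556)              */
};

/* Model of the CAM search: the EA is the key; on a hit the RA, the
 * attributes, and the cache lock bits are driven onto the output lines. */
static bool tt_lookup(const struct tt_entry tbl[TT_ENTRIES], uint64_t ea_tag,
                      uint64_t *ra, uint8_t *attr, unsigned *lock)
{
    for (int i = 0; i < TT_ENTRIES; i++) {
        if (tbl[i].valid && tbl[i].ea_tag == ea_tag) {
            *ra   = tbl[i].ra;
            *attr = tbl[i].attributes;
            *lock = tbl[i].cache_lock;
            return true;        /* hit */
        }
    }
    return false;               /* miss: a reload would be triggered */
}
```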

In one particular embodiment, to lock information corresponding to an EA in a cache, an instruction may identify an EA and that data corresponding with this EA should be locked. This instruction may be an instruction currently within the lexicon of a microprocessor architecture or may be a new or modified instruction. For example, the definition and functionality of the “mtspr” instruction in the PowerPC architecture may be enhanced, or the “mtc0” instruction in the MIPS architecture. Additionally, this instruction may perform a “sync” function, as is known in the art, such that the context of the system is synchronized before the data is locked. This instruction may also be operable to write to, modify, or set one or more configuration registers, or to read the lock information corresponding to an EA.

Upon receiving the EA and cache lock information from the instruction, translation table 440 may look at all of its valid entries 550 for a match to the EA and set lock information 556 of matching entry 550 if a match is found, such that data corresponding with that entry 550 will be locked in L2 cache mechanism 490.

If no match is found to the EA, at least a portion of translation table 440 is reloaded. Upon reloading translation table 440, cache lock information 556 of reloaded entry 550 matching the EA is set based on the instruction. Alternatively, translation table 440 may automatically reload every entry of translation table 440 if no match is found to the EA. This automatic reload may, in turn, be initiated by an exception that is taken when no match to the EA provided by the instruction is found in translation table 440.

Lock information 556 may be a series of bits designating whether the data corresponding to the address is to be locked. In one particular embodiment, lock information 556 may be three bits which may designate which way of an 8-way set associative cache is to be locked.
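For example, under the assumption that the three bits simply encode a way number (with whether a line is locked at all signalled separately, for instance by an attribute bit), a sketch of the decode might look like the following; a real design could instead reserve one encoding to mean "not locked".

```c
/* Hypothetical decode of the three lock bits: they name the way of an
 * 8-way set-associative cache that holds the locked line. */
static inline unsigned locked_way(unsigned cache_lock_bits)
{
    return cache_lock_bits & 0x7u;   /* way 0..7 */
}
```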

As particular entries 550 of translation table 440 correspond to information locked within a cache, it is expected that this information will be accessed on a fairly regular basis. Consequently, it may be inefficient for entries 550 corresponding to this locked information to be overwritten or replaced with another pair of EA and RA in translation table 440. Therefore, in one particular embodiment, entries 550 also contain one or more translation table lock bits to store translation table locking information pertaining to locking entries 550 within translation table 440. These bits allow translation table 440 to lock entries 550 within translation table 440. In one embodiment, translation table 440 may lock an entry 550 containing the RA and attribute bits corresponding to the EA received from the instruction, if the instruction indicates that the EA is to be locked in the cache, by setting these translation table lock bits. Thus, that entry 550 is not replaced by subsequent reloads of translation table 440, and subsequent accesses to the identical EA will result in entry 550 being output by translation table 440 without translation table 440 having to reload any data. Conversely, when this entry 550 is invalidated it may be unlocked.
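A minimal sketch of how reload logic might honor these translation table lock bits when choosing which entry to overwrite is shown below; the round-robin base policy, the table size, and the function name are assumptions for illustration only.

```c
#include <stdbool.h>

#define TT_ENTRIES 64   /* assumed table size, as in the sketch above */

/* Pick a translation table entry to overwrite on a reload, skipping any
 * entry whose translation table lock bit is set.  The round-robin pointer
 * is an assumption; any base policy could be filtered the same way.
 * Returns -1 if every entry is locked (a real design would prevent this). */
static int tt_pick_victim(const bool tt_locked[TT_ENTRIES])
{
    static int next;    /* simple round-robin starting point */
    for (int tries = 0; tries < TT_ENTRIES; tries++) {
        int idx = (next + tries) % TT_ENTRIES;
        if (!tt_locked[idx]) {
            next = (idx + 1) % TT_ENTRIES;
            return idx;
        }
    }
    return -1;
}
```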

Though each entry 550 of translation table 440 may only utilize three bits per entry to store cache locking information, even this small hardware addition to translation table 440 may be deemed too costly. However, in many cases very little data is locked in a cache. Correspondingly, then, there is less of a need to store this cache locking information in translation table 440, and further reductions in hardware requirements may be achieved by devoting only certain lines within translation table 440 to storing the real address and attributes of data locked in a cache.

FIG. 6 illustrates one embodiment of such a translation table. Translation table 440 may be a content addressable memory comprising a set of entries 650, 652 capable of storing data. Each entry 650 may contain a real address 654 and corresponding attributes 656. Entry 652 also contains a real address 654 and corresponding attributes 656. However, in entries 652, attributes 656, in turn, contain lock information 658, comprising bits pertaining to the locking of data corresponding to real address 654 of entry 652.

Thus, upon first receiving an EA and cache lock information from an instruction, translation table 440 loads information pertaining to this EA into entry 652. If an entry corresponding to the EA already exists in one of the other entries 650 of translation table 440, the information in this entry may be moved or copied to entry 652. Additionally, upon loading entry 652 with information, this entry 652 may be locked within translation table 440, as described above. Subsequently, when translation table 440 receives the EA on its input lines, the information from entry 652 is output on the output lines.

It will be understood by those of skill in the art that, similarly to the embodiment described with respect to FIG. 5, if no match is found to the EA, at least a portion of translation table 440 may be reloaded, with the RA corresponding to the EA loaded into entry 652. Upon reloading translation table 440, cache lock information 658 of reloaded entry 652 matching the EA is set such that data corresponding to the RA loaded into entry 652 will be locked in a cache. Alternatively, translation table 440 may automatically reload every line in translation table 440 if no match is found to the EA. This automatic reload may, in turn, be initiated by an exception that is taken when no match to the EA provided by the instruction is found in translation table 440.

Returning now to FIG. 4, the RA generated by translation table 440 is then stored in cache miss transaction pipeline 452 along with its corresponding EA. Information from entries in cache miss transaction pipeline 452 can then be placed in load/miss queue 470 or store queue 480 of interface queues 484 (depending on the type of instruction which originally referenced the EA) to be processed by L2 cache mechanism 490. Entries within load/miss queue 470 and entries in store queue 480 consist only of an RA and lock information corresponding to the RA. From load/miss queue 470 or store queue 480, an RA and corresponding lock information may be sent to L2 cache mechanism 490.
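Contrasted with the baseline queue entry sketched earlier, a queue entry in this design only needs the real address and the lock bits. Again, the struct name and field widths are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical queue entry for the design of FIG. 4: the EA is no longer
 * carried past the translation table, so each load/miss- or store-queue
 * slot holds only the RA plus the cache lock information that travelled
 * with it from the translation table. */
struct lockaware_queue_entry {
    uint64_t ra;             /* real (physical) address                 */
    unsigned lock_info : 3;  /* cache lock information from the table   */
    /* ... other per-entry state ...                                    */
};
```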

FIG. 7 depicts one embodiment of an architecture for L2 cache mechanism 490. L2 cache mechanism 490 comprises L2 cache 700 and L2 cache refill/miss mechanism 720. L2 cache 700 may store data and L2 cache refill/miss mechanism 720 may implement a replacement policy for L2 cache 700. Thus, cache mechanism 490 may implement an L2 cache capable of locking data. If L2 cache mechanism 490 receives a real address from store queue 480, data corresponding to this real address may be stored in L2 cache 700. If the lock information indicates that this data is to be locked in L2 cache 700, this lock information may be stored in L2 cache refill/miss mechanism 720. Conversely, if L2 cache mechanism 490 receives an RA and associated lock information from load/miss queue 470, L2 cache 700 may be checked for data corresponding to the RA, and if data is found the data can be returned to the requesting instruction. If matching data is not found in cache 700, data corresponding to the real address can be loaded into cache 700 from main memory by L2 cache refill/miss mechanism 720. When this data is loaded into L2 cache 700, if the received lock information indicates that data associated with the RA is to be locked, this lock information may be stored in L2 cache refill/miss mechanism 720.

In most cases, cache 700 is full; consequently, to load data corresponding with the RA into the cache, L2 cache refill/miss mechanism 720 may implement a replacement policy to determine which entry in cache 700 to replace with the data to be loaded. Cache refill/miss mechanism 720 decides on an entry within cache 700 to be replaced, and can then check the lock information associated with the selected entry. If the lock information of the selected entry indicates that the entry is to remain locked, cache refill/miss mechanism 720 may select another entry within cache 700 to be replaced. Thus, lock information associated with a cache entry can override the replacement policy of cache refill/miss mechanism 720, whether that replacement policy be least recently used (LRU), most recently used (MRU), etc.
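The C sketch below illustrates one possible form of this override for a single 8-way set, assuming the refill/miss mechanism keeps a per-way lock bit and an LRU-ordered candidate list; the function name and data layout are illustrative assumptions, not the disclosed implementation.

```c
#include <stdbool.h>

#define WAYS 8

/* Choose the way to replace in one set of an 8-way cache.  lru_order[]
 * lists the ways from least to most recently used; locked[] holds the
 * per-way lock bits kept by the refill/miss mechanism.  The lock bits
 * override the base LRU policy: the least recently used *unlocked* way
 * is chosen.  Returns -1 if every way is locked (a real design would
 * bound how many ways may be locked per set). */
static int pick_victim_way(const int lru_order[WAYS], const bool locked[WAYS])
{
    for (int i = 0; i < WAYS; i++) {
        int way = lru_order[i];
        if (!locked[way])
            return way;
    }
    return -1;
}
```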

In the foregoing specification, the invention has been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the invention.

Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments.

However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.

CLAIMS

1. A method for efficient cache locking, comprising: providing a location for cache lock information in a translation table, wherein the location corresponds to an entry in a set of entries in the translation table; and transmitting the cache lock information to a cache when a virtual address matches with the entry in the set of entries.

2. The method of claim 1, comprising providing a set of locations for cache lock information in the translation table, wherein each of the locations corresponds to an entry in the set of entries in the translation table.

3. The method of claim 2, comprising issuing an instruction referencing a virtual address.

4. The method of claim 3, comprising synchronizing the context in response to the instruction.

5. The method of claim 4, comprising setting the cache lock information in the location corresponding to one entry in the set of entries in the translation table such that data corresponding to the one entry is locked in a cache if the virtual address corresponds to the one entry.

6. The method of claim 5, wherein locking the data in the cache comprises overriding the replacement policy of the cache based on the cache lock information.

7. The method of claim 6, wherein the policy is a least recently used (LRU) policy.

8. The method of claim 6, comprising setting translation table lock information corresponding to the one of the set of entries such that the entry is locked in the translation table.

9. The method of claim 8, comprising setting the translation table lock information corresponding to the one of the set of entries such that the entry is unlocked in the translation table if the entry is invalidated.

10. The method of claim 3, comprising refilling the translation table if the virtual address does not correspond to any of the set of entries.

11. The method of claim 3, comprising generating an exception if the virtual address does not correspond to any of the set of entries.